A place-based approach to two-mode relational data

Unlocking bipartite networks with the “Places” R package

Cécile Armand (Ecole Normale Supérieure de Lyon, Laboratoire de Recherches Historiques Rhône-Alpes)


Abstract: This document presents an effective approach for handling two-mode networks, utilizing the concept of ‘place’ or structural equivalence as its fundamental framework. It primarily relies on the ‘Places’ and ‘igraph’ R packages. To illustrate this method, it employs an edge list representing students and their respective universities in the United States. The data source for this analysis is derived from the directory of an alumni club, specifically the American University Club of Shanghai, which was originally published in 1936. The document proceeds through four main steps: (1) identification of places from the edge list, (2) transformation of the list of places into a network of places, along with its transposed network of universities, (3) visualization and analysis of the network, including community detection, and (4) the introduction of a more flexible approach grounded in the concepts of regular equivalence or k-places.

Prerequisites: Basic notions of network analysis and the “tidyverse” suite.

Introduction

0.1 Context

Two-mode networks1, i.e. networks that involve two different types of nodes, such as persons and organizations, represent a significant proportion of network analysis research in the humanities and social sciences. Indeed, it is not always possible to gather first-hand data on direct relationships, such as friendship or family ties. In many situations, social relations are mediated by a third party or have to be inferred from indirect ties, such as school attendance, co-participation in events, membership in clubs or corporate boards.

Analyzing two-mode networks raises significant challenges, which have been extensively described in the specialized literature (Borgatti 2009). Two major approaches have been commonly deployed. The first approach, which applies algorithms developed for one-mode networks, disregards the unique characteristics of two-mode data and introduces biases that have been discussed in previous works (Borgatti, Everett 1997). The second approach involves projecting the original two-mode network into two separate one-mode networks (Everett, Borgatti 2013). Depending on their interest, researchers typically focus on one projection and discard the other. However, this method has been shown to result in a loss of information and the creation of artificial clustering, which can introduce biases in the interpretation of the data (Newman et al. 2001), (Uzzi, Spiro 2005), (Zhou et al. 2007).

The place-based methodology introduced in this document offers a powerful alternative to the two mainstream approaches described above, as illustrated in Figure 1. First, it allows for a reduction of the network without sacrificing information. Second, it maintains the inherent duality property found in two-mode networks (Field et al. 2006).

Figure 1 - The place-based methodology


It is important to emphasize that the notion of place2 in this context should not be understood in a geographical sense. Instead, it draws inspiration from the concept of structural equivalence3 in network analysis, originally introduced by sociologist Narciso Pizarro (Pizarro 2002), (Pizarro 2007). In the context of individuals affiliated with certain institutions, each place refers to a group of individuals who share the exact same set of institutions. In other words, individuals belong to the same place if they are affiliated with the same institution or combination of institutions.
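To make the definition concrete, the following base R sketch groups individuals by the exact set of institutions they attended. It is only an illustration of the concept on a toy edge list with hypothetical names; it is not the algorithm implemented in the Places package.

```r
# Toy two-mode edge list (hypothetical names, not taken from the AUC data)
edges <- data.frame(
  person = c("Ann", "Ann", "Bob", "Bob", "Cy", "Dee"),
  school = c("Yale", "Columbia", "Columbia", "Yale", "Yale", "Harvard"),
  stringsAsFactors = FALSE
)

# Signature of each person: the sorted, de-duplicated set of schools attended
sig <- sapply(split(edges$school, edges$person),
              function(s) paste(sort(unique(s)), collapse = ";"))

# Persons with identical signatures belong to the same place
toy_places <- split(names(sig), sig)
toy_places
```

Here Ann and Bob share the place defined by {Columbia;Yale}, whereas Cy and Dee each form a place of their own.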

While popular software tools like Gephi and Cytoscape do not provide built-in functions for place-based analysis, researchers can resort to the Places R package developed by Delio de Lucena at Science-Po Toulouse. To the best of our knowledge, this package is the only available library that enables place detection and analysis.

0.2 Packages

This document relies on the following packages:

  • tidyverse: a standard suite of packages for manipulating data in a tidy format.
  • Places: a package specifically designed to find places in two-mode data, developed by Delio de Lucena (Science-Po Toulouse).
  • igraph: a reference package for building, analyzing and visualizing networks.
  • kableExtra: a package used to enhance the display of data frames and make the data more legible.

0.3 Data

The example data used in this tutorial was created by the author from a directory of the American University Club of Shanghai published in 1936 (Shanghai 1936). The original dataset can be downloaded from Zenodo. It is freely accessible and open for reuse. In this tutorial, we use a simplified version of the original dataset, which we describe below.

The data is a typical edge list5 of individuals linked to the universities they attended. It contains 682 academic curricula distributed among the 418 members of the American University Club of Shanghai. Since individuals may have obtained several degrees from different universities, they may appear in several rows. Each row refers to a distinct curriculum.

The dataset includes the following variables (columns):

  • Name: Individual’s name
  • Nationality: Individual’s national origin (Chinese, Western, Japanese)
  • University: Name of the university attended
  • Degree: Nature of academic degree obtained
  • Field: Major field of study
  • Start_year: Year of enrollment or graduation
  • End_year: Year of graduation

To load the data, run the following line:

library(readr)

auc <- read_delim("data/auc.csv", delim = ";", escape_double = FALSE, col_types = cols(Nationality = col_factor(levels = c("Chinese", "Japanese", "Western")), Start_year = col_number(), End_year = col_number()), trim_ws = TRUE)

head(auc)
# A tibble: 6 × 7
  Name         Nationality University     Degree    Field    Start_year End_year
  <chr>        <fct>       <chr>          <chr>     <chr>         <dbl>    <dbl>
1 Ting_H.N.    Chinese     Pennsylvania   Bachelor  Arts           1915     1918
2 Inui_Kiyosue Japanese    Michigan       Bachelor  Arts           1906     1906
3 Inui_Kiyosue Japanese    Tokyo Imperial Doctorate Law            1897     1901
4 Yu_Leo W.    Chinese     Purdue         <NA>      <NA>           1925     1926
5 Yu_Leo W.    Chinese     Nebraska       Bachelor  Electri…       1925     1925
6 Yu_Leo W.    Chinese     Nevada         <NA>      <NA>           1922     1923

The names() function lists the variables and the summary() function provides a summary description of the dataset:

names(auc)
[1] "Name"        "Nationality" "University"  "Degree"      "Field"      
[6] "Start_year"  "End_year"   
summary(auc)
     Name             Nationality   University           Degree         
 Length:682         Chinese :401   Length:682         Length:682        
 Class :character   Japanese:  6   Class :character   Class :character  
 Mode  :character   Western :275   Mode  :character   Mode  :character  
                                                                        
                                                                        
                                                                        
                                                                        
    Field             Start_year      End_year   
 Length:682         Min.   :1883   Min.   :1883  
 Class :character   1st Qu.:1914   1st Qu.:1915  
 Mode  :character   Median :1920   Median :1921  
                    Mean   :1920   Mean   :1920  
                    3rd Qu.:1926   3rd Qu.:1926  
                    Max.   :1935   Max.   :1935  
                    NA's   :2      NA's   :1     

0.4 Workflow

Figure 2 presents a tentative workflow for developing an effective place-based methodology, which comprises essential and optional modules. In this tutorial, we will focus on:

  1. Detecting and analyzing places from two-mode data (2 and 3 on Figure 2)
  2. Creating dual networks of places and sets from the detected places (4)
  3. Basic network analysis and visualization (5)
  4. Detecting communities in the dual network (6)
  5. A brief introduction to regular equivalence and the k-places function.

Figure 2 - Standard workflow for a place-based analysis ([interactive version](https://xmind.app/mindmap/places-in-two-mode-data-a-workflow/YX2g4H/?from=gallery#))

1 Place detection

The first section aims at detecting and analyzing places from our dataset of students and universities.

First, we need to install the “Places” package from the author’s repository and load it. We also convert the tibble returned by read_delim() into a plain data frame:

install.packages("http://lereps.sciencespo-toulouse.fr/IMG/gz/places_0.2.3.tar.gz", repos = NULL, type = "source")
library(Places)
class(auc)
[1] "spec_tbl_df" "tbl_df"      "tbl"         "data.frame" 
auc <- as.data.frame(auc)

We can now apply the places() function, which is the key function for detecting places. It takes three arguments:

  • data: the input dataset (here, the edge list of students and universities, “auc”). The input data must be in a two-mode edge list format.
  • col.elements: the source column, designated as “Elements” in the Places package terminology (here, the students).
  • col.sets: the target column, designated as “Sets” (here, the universities attended by the students).

Result1 <- places(data = auc, col.elements = "Name", col.sets = "University")
Cleaning data ... rows with empty cells and NAs will be removed
Rows removed: 0
Cleaning data ... duplicate rows will be removed
Duplicate rows removed: 72
There are 418 elements and 146 sets
Working ...
A total of 223 places have been identified


As indicated in the console, 223 unique places were found from the initial dataset of 418 students (Elements) and 146 universities (Sets).
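The 72 duplicate rows reported in the console presumably correspond to students who appear more than once with the same university, for instance because they obtained several degrees there. The base R sketch below reproduces this kind of check on a toy data frame with hypothetical values. How the Places package performs the cleaning internally is an assumption here; the sketch only illustrates the idea.

```r
# Toy edge list: Ann holds two degrees from the same university (hypothetical data)
toy <- data.frame(
  Name = c("Ann", "Ann", "Ann", "Bob"),
  University = c("Columbia", "Columbia", "Yale", "Yale"),
  Degree = c("Bachelor", "Master", "Doctorate", "Bachelor")
)

# Keep only the two columns that define the two-mode network
pairs <- toy[, c("Name", "University")]

n_dupl <- sum(duplicated(pairs))  # rows repeating an earlier student-university pair
toy_unique <- unique(pairs)       # one row per distinct pair

n_dupl
nrow(toy_unique)
```

Applied to the two columns of the real dataset, the same reasoning would leave 610 distinct student-university pairs out of the 682 rows (682 minus the 72 duplicates).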

It is possible to make the code shorter by omitting the argument names, since the arguments are then matched by position:

Result1 <- places(auc, "Name", "University") 


The places() function returns a list object which contains three data frames:

  1. The original two-column data frame, with an additional “Places” column containing the place labels
  2. A data frame containing information about places
  3. The network of places in a two-mode edge list format

The data frame containing information about places includes the following features:

  • PlaceNumber: the number of the place, ordered from the highest to the lowest number of sets.
  • PlaceLabel: the label of the place. Labels start with P, followed by the place number and, within parentheses, the number of elements in the place and the number of sets defining it.
  • NbElements: the number of elements (students) contained in the place.
  • NbSets: the number of sets (universities) defining the place.
  • PlaceDetail: the names of all the elements in the place and of all the sets defining it.

To enable further manipulation, we extract the key information in a data frame format:

Result1_df <- as.data.frame(Result1$PlacesData) 


We can use “kableExtra” to enhance the table and make the data more legible. Only the first 6 rows are displayed below:

library(kableExtra)

kable(head(Result1_df), caption = "First 6 places") %>%
  kable_styling(bootstrap_options = "striped", full_width = T, position = "left")
First 6 places
PlaceNumber PlaceLabel NbElements NbSets PlaceDetail
1 P001(1-4) 1 4 {Lacy_Carleton} - {Columbia;Garrett Biblical Institute;Northwestern;Ohio Wesleyan}
2 P002(1-4) 1 4 {Luccock_Emory W.} - {McCormick Seminary;Northwestern;Wabash;Wooster}
3 P003(1-4) 1 4 {Ly_J .Usang} - {Columbia;Haverford;New York University;Pennsylvania}
4 P004(1-4) 1 4 {Pott_Francis L. Hawks} - {Columbia;General Theological Seminary;Trinity;University of Edinburgh}
5 P005(1-3) 1 3 {Chu_Fred M.C.} - {Chicago;Pratt Institute;Y.M.C.A. College}
6 P006(1-3) 1 3 {Chung_Elbert} - {Georgetown;Pennsylvania;Southern California}


In the following section, we will conduct a more in-depth examination of the attributes associated with places.

1.1 Places attributes

We first explore how the students (Elements) are distributed among places using the table() and hist() functions in base R:

table(Result1_df$NbElements)

  1   2   3   4   5   7  10  11  12  15  16  18 
179  15  12   4   1   3   1   1   1   2   2   2 
hist(Result1_df$NbElements, main = "Students by place")


The table and histogram reveal that most places (179 of 223, or 80%) contain a single student and thus correspond to unique trajectories. Such places coincide with the individual students themselves.
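As a quick sanity check, the counts printed by table() can be used to recover the totals reported earlier. The sketch below simply re-enters the values shown in the console output above; the variable names are illustrative.

```r
# Values copied from the output of table(Result1_df$NbElements)
place_size <- c(1, 2, 3, 4, 5, 7, 10, 11, 12, 15, 16, 18)  # students per place
n_places   <- c(179, 15, 12, 4, 1, 3, 1, 1, 1, 2, 2, 2)    # number of such places

total_places   <- sum(n_places)               # 223 places
total_students <- sum(place_size * n_places)  # 418 students

share_single_places   <- n_places[1] / total_places    # share of places with one student
share_single_students <- n_places[1] / total_students  # share of students alone in their place

round(100 * c(share_single_places, share_single_students))
```

Single-student places thus account for about 80% of places but only about 43% of students, which is worth keeping in mind when interpreting the distribution.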

Similarly, we can explore the distribution of the universities (Sets) among places:

table(Result1_df$NbSets)

  1   2   3   4 
 79 119  21   4 
hist(Result1_df$NbSets, main = "Universities by place")

Most places (198 of 223) are defined by one or two universities, which means that the majority of students attended at most two different universities. The 119 places defined by two universities suggest that many students were relatively mobile and transferred to a different institution to complete their training. Only 25 places, each containing a single student, are defined by more than two universities: 21 students attended 3 universities and 4 students attended 4.

1.2 Most significant places

Beyond crude statistics, we want to know more about the students and the universities that define each place. The PlaceDetail column provides this information. Since it would be time-consuming to examine the 223 places one by one, we start with the most significant places, that is, those which include at least 2 students and 2 universities. The filter() function from dplyr returns 13 such places:

library(dplyr) # provides filter(), arrange() and the joins used below
n2 <- Result1_df %>% 
  filter(NbElements > 1 & NbSets > 1)

kable(n2, caption = "The 13 most significant places") %>%
  kable_styling(bootstrap_options = "striped", full_width = T, position = "left")
The 13 most significant places
PlaceNumber PlaceLabel NbElements NbSets PlaceDetail
26 P026(4-2) 4 2 {Chu_Percy;Lee_Alfred S.;Liang_Louis K.L.;Sun_J.H.} - {Columbia;New York University}
27 P027(3-2) 3 2 {Au_Silwing P.C.;Yee_S.K.;Zee_Andrew} - {Chicago;Michigan}
28 P028(3-2) 3 2 {Chang_Ting-Chin;Hsueh_Wei Fan;Wong_Tse-Kong} - {Ohio State;Pennsylvania}
29 P029(3-2) 3 2 {Ho_Teh-Kuei;Sze_F.C.;Tsai_Thomas Wen-hsi} - {Harvard;Wisconsin}
30 P030(3-2) 3 2 {Huang_H.L.;Wang_K.P.;Welles_Henry H.} - {Columbia;Princeton}
31 P031(2-2) 2 2 {Chen_Kwan-Pu;Wong_I.K.} - {Pennsylvania;St. John’s University}
32 P032(2-2) 2 2 {Jen_Lemuel C.C.;West_Eric Ralph} - {California;George Washington}
33 P033(2-2) 2 2 {Lee_Shee-Mou;Parker_Frederick A.} - {Harvard;Massachusetts Institute of Technology}
34 P034(2-2) 2 2 {Lin_Peter Wei;Ma_Y.C.} - {Columbia;Yale}
35 P035(2-2) 2 2 {Lum_Joe W.;Wu_Jack Foy} - {Columbia;Stanford}
36 P036(2-2) 2 2 {Ngao_Sz-Chow;Speery_Henry M.} - {Columbia;Michigan}
37 P037(2-2) 2 2 {Sze_Ying Tse-yu;Zhen_M.S.} - {Columbia;Massachusetts Institute of Technology}
38 P038(2-2) 2 2 {Tsao_Y.S.;Yen_Fu-ching} - {Harvard;Yale}


At this stage, it is recommended to carefully examine the list of places and their details, starting from the most important and gradually expanding the selection to include less populated places.

1.3 Typology of places

If the data includes qualitative attributes, these attributes can be used to further characterize the places and build a typology. In our example, for instance, we considered the students’ field of study and the time of graduation to establish the relative strength of places, as shown in the table below:

| Field of study / Period of study | SAME TIME | DIFFERENT TIME |
|---|---|---|
| SAME DISCIPLINE | TYPE A: Strong potential for regular interaction (4 places, 9%) | TYPE C: Potential for later collaboration (7 places, 16%) |
| DIFFERENT DISCIPLINE | TYPE B: Potential for extra-curricular interaction (8 places, 18%) | TYPE D: Shared academic experience and cultural background (25 places, 32%) |

This part is not directly reproducible because it depends on the intrinsic qualities of the dataset. Nevertheless, it is worth mentioning because the rationale can be adapted to other data and research questions.
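For readers who wish to adapt the rationale, here is a minimal sketch of how such a typology could be coded. The function classify_place() is hypothetical, and so are its two criteria: "same discipline" is taken to mean that all members of the place share one field, and "same time" that all their periods of study overlap. The exact rules used to build the table above are not specified in this document, and missing values are not handled.

```r
# Hypothetical helper: assign a place to type A, B, C or D
# fields: fields of study of the members of the place
# start, end: first and last year of study of each member
classify_place <- function(fields, start, end) {
  same_field <- length(unique(fields)) == 1  # assumption: one shared field
  same_time  <- max(start) <= min(end)       # assumption: all periods overlap
  if (same_field && same_time) {
    "A"
  } else if (same_time) {
    "B"
  } else if (same_field) {
    "C"
  } else {
    "D"
  }
}

# Two students in Law whose periods of study overlap
classify_place(c("Law", "Law"), start = c(1915, 1916), end = c(1918, 1919))
# Two students in different fields, twenty years apart
classify_place(c("Law", "Arts"), start = c(1900, 1920), end = c(1904, 1924))
```

The first call returns "A" and the second "D". Other definitions of temporal or disciplinary proximity would be equally defensible, depending on the research question.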

In the next section, we will see how we can build and analyze networks of places to further investigate the structure and dynamics of Sino-American alumni networks (in this specific case).

2 From places to networks

2.1 Create networks

The results of the places() function also include an edge list of places linked by sets (designated as “Edgelist”). We can take advantage of this list to build a network of places linked by universities (Sets) and the transposed network of universities (Sets) linked by places.

To build a network of places linked by universities, we start by creating an adjacency matrix from the edge list:

bimod<-table(Result1$Edgelist$Places, Result1$Edgelist$Set) 
PlacesMatrix<-bimod %*% t(bimod)
diag(PlacesMatrix)<-0 
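# The matrix product used above can be checked on a small example. The sketch
# below uses a toy edge list with hypothetical place labels; the column names
# mirror those of Result1$Edgelist as used in this tutorial.

```r
# Toy edge list of places and the universities that define them
toy_edges <- data.frame(
  Places = c("P1", "P1", "P1", "P2", "P2", "P3"),
  Set    = c("Columbia", "New York University", "Pennsylvania",
             "Columbia", "New York University", "Pennsylvania")
)

# Incidence matrix: one row per place, one column per university
incidence <- unclass(table(toy_edges$Places, toy_edges$Set))

# Cell [i, j] of the product counts the universities shared by places i and j
co <- incidence %*% t(incidence)
co_diag <- diag(co)  # before zeroing: number of universities defining each place
diag(co) <- 0        # remove self-ties, as in the tutorial

co
```

In this toy matrix, P1 and P2 share two universities (weight 2), P1 and P3 share one, and P2 and P3 share none, so no tie would be created between them.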


Next, we use the graph_from_adjacency_matrix function included in the igraph package to transform the matrix into a network of places linked by universities (Net1):

library(igraph)
Net1 <-graph_from_adjacency_matrix(PlacesMatrix, mode="undirected", weighted = TRUE)


We apply the same method for building the transposed network of universities linked by places (Net2) :

bimod2<-table(Result1$Edgelist$Set, Result1$Edgelist$Places)
PlacesMat2<-bimod2 %*% t(bimod2)
diag(PlacesMat2)<-0

Net2<-graph_from_adjacency_matrix(PlacesMat2, mode="undirected", weighted = TRUE)
# Convert igraph objects into edge lists (not run in this session)
  # edgelist1 <- as_edgelist(Net1)
  # edgelist2 <- as_edgelist(Net2)
# Export edge lists and node lists as csv files (not run in this session)
  # write.csv(edgelist1, "edgelist1.csv")
  # write.csv(Result1_df, "nodelist1.csv")
  # write.csv(edgelist2, "edgelist2.csv")

2.2 Visualize networks

Let’s plot the network of places linked by universities, followed by the transposed network of universities linked by places:

plot(Net1, vertex.size = 5, 
     vertex.color = "orange", 
     vertex.label.color = "black", 
     vertex.label.cex = 0.3, 
     main="Network of places linked by universities")


plot(Net2, vertex.size = 5, 
     vertex.color = "light blue", 
     vertex.label.color = "black", 
     vertex.label.cex = 0.3, 
     main="Network of universities linked by places")

The plot function from the igraph package includes various arguments. The first argument is required, as it specifies the network object to be plotted. The other arguments are optional. In the above example, we specified the following arguments:

  • vertex.size: size of vertices (or nodes).
  • vertex.color: color of vertices.
  • vertex.label.color: color of vertex labels.
  • vertex.label.cex: size of vertex labels.
  • main: title for the graph.

As evident from the graphs, the two networks are each made up of a large, densely connected component surrounded by a myriad of isolated nodes and smaller components. The latter refer to the singular curricula described in the previous section. To substantiate this preliminary visual exploration, it is crucial to turn to network metrics.

2.3 Analyze networks

In network analysis, we usually distinguish between global metrics, which serve to characterize the overall structure of the network, and local metrics, which characterize the vertices and their position in the network.

2.3.1 Global metrics

There are many metrics to define the structure of networks. In the following, we focus on the most basic ones, which can be computed with the following functions from the igraph package:

  • summary: provides summary statistics on the network (nature of the network, number of vertices and ties, attributes if applicable)
  • graph.density: density of the graph
  • no.clusters: number of components
  • clusters()$csize: size of components
  • table(E(Net1)$weight): table of edge weights

summary(Net1)
IGRAPH d508280 UNW- 223 1606 -- 
+ attr: name (v/c), weight (e/n)
summary(Net2)
IGRAPH 68c78ee UNW- 146 197 -- 
+ attr: name (v/c), weight (e/n)


The results indicate that the two networks are undirected, named and weighted (UNW). The network of places linked by universities (Net1) contains 223 vertices (places) and 1606 ties, each tie indicating that two places share at least one university. The network of universities linked by places (Net2) includes 146 vertices (universities) and 197 ties, each tie indicating that two universities share at least one place. The name is the only vertex attribute and the weight is the only edge attribute.

graph.density(Net1)
[1] 0.06488102
graph.density(Net2)
[1] 0.01861124


Graph density is useful mostly when comparing different networks. In our example, the network of places linked by universities is denser than the network of universities linked by places.

no.clusters(Net1)
[1] 39
no.clusters(Net2)
[1] 39


The two networks comprise 39 components each. This property illustrates the duality of two-mode networks.
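The duality can be verified on a small example. The sketch below builds both projections from a toy incidence matrix with hypothetical labels and uses edge_density(), count_components() and components(), which are the names under which recent igraph releases provide graph.density(), no.clusters() and clusters(). The helper project() is introduced here for illustration only.

```r
library(igraph)

# Toy incidence matrix: rows are places, columns are universities (hypothetical)
inc <- matrix(c(1, 1, 0, 0,
                0, 1, 1, 0,
                0, 0, 0, 1),
              nrow = 3, byrow = TRUE,
              dimnames = list(c("P1", "P2", "P3"), c("U1", "U2", "U3", "U4")))

# Illustrative helper: one-mode projection of an incidence matrix
project <- function(m) {
  adj <- m %*% t(m)
  diag(adj) <- 0
  graph_from_adjacency_matrix(adj, mode = "undirected", weighted = TRUE)
}

g_places <- project(inc)     # places linked by shared universities
g_univ   <- project(t(inc))  # universities linked by shared places

edge_density(g_places)       # 1 tie out of 3 possible pairs
count_components(g_places)   # {P1, P2} and {P3}
count_components(g_univ)     # {U1, U2, U3} and {U4}
components(g_univ)$csize
```

Both projections have two components, because every component of the underlying two-mode network appears once on each side, provided that every place and every university has at least one tie in the two-mode data.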

clusters(Net1)$csize
 [1] 184   1   2   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1
[20]   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1
[39]   1
clusters(Net2)$csize
 [1]   1   1 105   2   1   1   1   1   1   1   1   1   1   1   1   3   1   1   1
[20]   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1
[39]   1


The tables show the size of the components. The largest component in the network of places includes 184 vertices. The remaining components consist of one dyad and a myriad of isolated vertices. The largest component in the network of universities includes 105 vertices. The other components include a triad, a dyad, and a myriad of isolated vertices.

Since the networks are weighted, it is interesting to examine the relative weight of ties by simply using the table function in base R:

table(E(Net1)$weight)

   1    2 
1594   12 
table(E(Net2)$weight)

  1   2   4 
190   6   1 


The network of places includes a majority of simple ties (1594) and 12 ties with a weight equal to 2. These 12 ties refer to pairs of places (academic trajectories) which share 2 sets (universities). The network of universities also includes a majority of simple ties (190), along with 6 ties of weight 2 (6 pairs of universities which share 2 places) and 1 tie of weight 4 (one pair of universities which share 4 places).

Let’s find out which pairs are the most significant by filtering the ties whose weight is greater than 1:

E(Net1)[weight > 1]
+ 12/1606 edges from d508280 (vertex names):
 [1] P003(1-4)--P016(1-3) P003(1-4)--P018(1-3) P003(1-4)--P026(4-2)
 [4] P003(1-4)--P121(1-2) P007(1-3)--P019(1-3) P007(1-3)--P085(1-2)
 [7] P015(1-3)--P031(2-2) P015(1-3)--P052(1-2) P016(1-3)--P018(1-3)
[10] P016(1-3)--P026(4-2) P017(1-3)--P072(1-2) P018(1-3)--P026(4-2)
E(Net2)[weight == 2]
+ 6/197 edges from 68c78ee (vertex names):
[1] California         --Columbia             
[2] Chicago            --Columbia             
[3] Columbia           --Pomona               
[4] Hawaii             --Pennsylvania         
[5] New York University--Pennsylvania         
[6] Pennsylvania       --St. John's University
E(Net2)[weight == 4]
+ 1/197 edge from 68c78ee (vertex names):
[1] Columbia--New York University


The results indicate that the most frequent circulations occurred between Columbia and New York University, which largely reflects their geographical proximity, as both universities are located in New York City. Other important links existed between more distant universities, such as California and Columbia, or Hawaii and the University of Pennsylvania, which suggests that physical proximity was not the only factor accounting for the students’ mobility. Further investigation is required to understand the logic underlying these strong ties, but one advantage of network analysis is to point out connections that would otherwise remain unnoticed.
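When there are many weighted ties, it can be more convenient to turn the edge list into a data frame and sort it by weight. The sketch below does this on a toy graph. The first two weights echo the output above, and the third tie is hypothetical. The call is written as igraph::as_data_frame() as a precaution against functions of the same name in other packages.

```r
library(igraph)

# Toy weighted edge list (the Chicago-Michigan tie is hypothetical)
toy <- data.frame(
  from   = c("Columbia", "Columbia", "Chicago"),
  to     = c("New York University", "Chicago", "Michigan"),
  weight = c(4, 2, 1)
)
g <- graph_from_data_frame(toy, directed = FALSE)

# Edge list with attributes, as a regular data frame
ties <- igraph::as_data_frame(g, what = "edges")

# Keep ties of weight greater than 1, strongest first
strong <- ties[ties$weight > 1, ]
strong <- strong[order(-strong$weight), ]
strong
```

The resulting data frame can then be exported or joined with other attributes like any other table.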

2.3.2 Local metrics

There are many metrics to measure the relative position of vertices in networks. In the following, we focus on the most popular centrality metrics, which can all be computed with the igraph package:

  • Degree: the number of ties a node has. It is the simplest measure of centrality. In the following, we use a normalized version of the measure in order to enable comparisons across networks built from different data structures.
  • Eigenvector: a score that increases with the number of connections a node has to other well-connected nodes. It is a measure of the influence of a node in a network.
  • Betweenness: the number of times a node acts as a bridge along the shortest path between two other nodes. In this sense, the more central a node is, the greater control it has over the flows that go through it. It is often considered as a measure of brokerage, or the capacity of a node to mediate between other nodes.
  • Closeness: based on the length of the shortest paths between the node and all other nodes in the graph. As computed by igraph, it is the inverse of the sum of these lengths, so that the more central a node is, the closer it is to all other nodes and the higher its score.
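A small worked example may help to read these scores. The sketch below computes the four metrics on a toy star graph of four vertices, in which vertex 1 is the hub. The graph is purely illustrative and unrelated to the AUC data.

```r
library(igraph)

# Star graph: vertex 1 is tied to vertices 2, 3 and 4
star <- make_star(4, mode = "undirected")

deg  <- degree(star, normalized = TRUE)  # hub: 3/3 = 1, leaves: 1/3
betw <- betweenness(star)                # hub: 3 (one per pair of leaves), leaves: 0
clos <- closeness(star)                  # hub: 1/(1+1+1), leaves: 1/(1+2+2)
eig  <- eigen_centrality(star)$vector    # scaled so that the hub scores 1

round(cbind(deg, betw, clos, eig), 3)
```

The hub lies on the shortest path between each of the three pairs of leaves, hence a betweenness of 3. Its closeness is 1/3 because it is one step away from each of the three other vertices, whereas each leaf scores 1/5.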

Since betweenness and closeness centralities are meaningful only in a connected network, we first need to extract the main component of each network, using the induced.subgraph function included in igraph. First, we check the id number of the largest component in each network, which can be read from the component sizes printed above. The largest component in Net1 is component n°1, whereas it is n°3 in Net2:

# clusters(Net1) (not run in this session)
# clusters(Net2) (not run in this session)


Next, we extract these components:

Net1MC <- induced.subgraph(Net1,vids=clusters(Net1)$membership==1)
Net2MC <- induced.subgraph(Net2,vids=clusters(Net2)$membership==3)
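# Hard-coding the component number works for this dataset but has to be
# checked by hand for any other. A possible alternative, sketched below on a
# toy graph, is to look up the largest component programmatically. The sketch
# uses components() and induced_subgraph(), the names under which recent
# igraph releases provide clusters() and induced.subgraph().

```r
library(igraph)

# Toy graph with three components: a path of 4 vertices, a dyad and an isolate
toy <- graph_from_literal(a-b-c-d, e-f, h)

comp    <- components(toy)
biggest <- which.max(comp$csize)  # id of the largest component

# Keep only the vertices that belong to the largest component
main <- induced_subgraph(toy, vids = which(comp$membership == biggest))

vcount(main)
is_connected(main)
```

Note that which.max() returns the first maximum, so if two components were tied for the largest size, only the first one would be kept.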


The following code serves to compute the centrality metrics in the network of universities and compile them in a coherent data frame. We chose to normalize degree centrality to facilitate comparisons with other metrics and across networks.

Degree2 <- degree(Net2MC, normalized = TRUE) 
Eig2 <- evcent(Net2MC)$vector 
Betw2 <- betweenness(Net2MC)
Close2 <- closeness(Net2MC)
univ_metrics <- cbind(Degree2, Eig2, Betw2, Close2) 
univ_metrics_df <- as.data.frame(univ_metrics)

head(univ_metrics_df %>% arrange(desc(Degree2)))
                      Degree2      Eig2     Betw2      Close2
Columbia            0.3750000 1.0000000 2191.8655 0.005000000
Chicago             0.1923077 0.4721450 1338.0047 0.004149378
Pennsylvania        0.1923077 0.5089454  695.0678 0.004149378
Harvard             0.1730769 0.3653417 1020.2555 0.004629630
California          0.1346154 0.4282910  476.5158 0.004000000
New York University 0.1153846 0.6514840  343.6654 0.003717472


The table presents the first 6 universities ranked by degree centrality. Columbia University clearly stands out: it shares places with the largest number of other universities (39 of the 104 other universities in the main component), which reflects the large number of student trajectories that passed through it. Columbia also shows the highest eigenvector centrality, meaning it was connected to other important universities, and the highest betweenness centrality score, meaning that it served as an important bridge in the academic network.
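One point worth checking when working with weighted networks: by default, igraph's betweenness() and closeness() use the weight edge attribute, if present, as a distance, so that a tie of weight 4 counts as longer than a tie of weight 1. In a place-based network, where weights count shared places, this may not be the intended reading. The toy sketch below, with hypothetical vertices and weights, shows the difference. Passing weights = NA tells igraph to ignore the weights.

```r
library(igraph)

# Toy triangle: the direct A-C tie carries a weight of 4
tri <- graph_from_data_frame(
  data.frame(from = c("A", "B", "A"),
             to = c("B", "C", "C"),
             weight = c(1, 1, 4)),
  directed = FALSE
)

b_weighted   <- betweenness(tri)                # A-C is routed through B (1 + 1 < 4)
b_unweighted <- betweenness(tri, weights = NA)  # all pairs are directly tied

b_weighted
b_unweighted
```

With weights read as distances, B lies on the shortest path between A and C and obtains a betweenness of 1. Without weights, no vertex acts as a bridge. Which reading is appropriate depends on the research question.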

We can visualize the relative importance of universities in the network by indexing the size of vertices on their centrality metrics. In the following example, we chose to make the size of vertices proportionate to their degree centrality:

V(Net2MC)$size <- degree(Net2MC)

plot(Net2MC,
     vertex.color="light blue",
     vertex.shape = "circle",
     vertex.size = V(Net2MC)$size/2, 
     vertex.label.color = "black", 
     vertex.label.cex = V(Net2MC)$size/100, 
     main="Network of universities",
     sub = "The size of vertices represents their degree centrality")


Based on this initial investigation, it becomes evident that the networks of Sino-American alumni exhibited significant heterogeneity. In the upcoming section, we will delve into the application of community detection techniques to identify subgroups of more densely connected vertices within the two network structures.

3 Community detection

The purpose of this section is twofold :

  • Substantively, to understand how academic communities took shape through the interconnection of students’ trajectories (in this specific example, which can be transposed to different data and research questions).
  • Methodologically, to illustrate the duality of place-based networks and to demonstrate the value of jointly analyzing the network of places (elements) and its transposed network of universities (sets).

The igraph package offers various methods for detecting communities. In this document, we chose the Louvain algorithm (Blondel et al. 2008), which is one of the most popular methods for finding communities in large networks.

3.1 Find communities

To detect communities with the Louvain algorithm, we apply the cluster_louvain() function included in igraph. We continue to focus on the main component to avoid the detection of artificial clusters made up of only one vertex. Since the algorithm relies on a random ordering of vertices, we set a seed to make the results reproducible:

set.seed(2024)
lvc1 <- cluster_louvain(Net1MC)
lvc2 <- cluster_louvain(Net2MC)
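# To see what the function returns, it can help to run it on a graph whose
# community structure is obvious. The sketch below uses a toy graph made of
# two triangles joined by a single tie. The vertex names are hypothetical.

```r
library(igraph)

# Two triangles (a-b-c and d-e-f) joined by the bridge c-d
toy <- graph_from_literal(a-b, b-c, a-c, c-d, d-e, e-f, d-f)

set.seed(1)
comm <- cluster_louvain(toy)

length(comm)      # number of communities
membership(comm)  # community of each vertex
sizes(comm)       # size of each community
modularity(comm)  # modularity of the partition
```

On this toy graph the algorithm should recover the two triangles as communities. The accessors length(), membership(), sizes() and modularity() work in the same way on the results obtained for the real networks.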


Let’s inspect the results:

print(lvc1)
IGRAPH clustering multi level, groups: 7, mod: 0.52
+ groups:
  $`1`
   [1] "P001(1-4)"  "P004(1-4)"  "P007(1-3)"  "P012(1-3)"  "P014(1-3)" 
   [6] "P016(1-3)"  "P017(1-3)"  "P018(1-3)"  "P019(1-3)"  "P026(4-2)" 
  [11] "P030(3-2)"  "P034(2-2)"  "P035(2-2)"  "P036(2-2)"  "P037(2-2)" 
  [16] "P048(1-2)"  "P056(1-2)"  "P061(1-2)"  "P070(1-2)"  "P072(1-2)" 
  [21] "P085(1-2)"  "P088(1-2)"  "P091(1-2)"  "P098(1-2)"  "P101(1-2)" 
  [26] "P114(1-2)"  "P119(1-2)"  "P128(1-2)"  "P132(1-2)"  "P136(1-2)" 
  [31] "P138(1-2)"  "P149(15-1)" "P169(2-1)"  "P180(1-1)"  "P221(1-1)" 
  
  $`2`
  + ... omitted several groups/vertices
print(lvc2)
IGRAPH clustering multi level, groups: 9, mod: 0.49
+ groups:
  $`1`
   [1] "Antioch"                            "Cooper"                            
   [3] "Hawaii"                             "Illinois"                          
   [5] "Iowa"                               "Louisiana"                         
   [7] "Massachusetts Agricultural College" "New York State"                    
   [9] "Oberlin"                            "Pennsylvania"                      
  [11] "Reed"                               "St. John's University"             
  [13] "Swarthmore"                         "Temple"                            
  [15] "Vanderbilt"                         "Wesleyan"                          
  [17] "Y.M.C.A. Graduate School"           "Yale"                              
  + ... omitted several groups/vertices


The algorithm found 7 communities of places and 9 communities of universities. The modularity scores (mod.)6 are relatively satisfactory (0.52 and 0.49, respectively). As shown in the tables below, the size of communities ranges from 10 to 42 vertices in the network of places, and from 6 to 21 vertices in the network of universities.

table(sizes(lvc1))

10 20 25 26 35 42 
 1  1  1  2  1  1 
table(sizes(lvc2))

 6  8 10 14 18 21 
 2  2  1  2  1  1 


In the next section, we will show how to visualize the communities.

3.2 Plot communities

3.2.1 Network of places

First, we create a group for each community and we set a different color for each group:

V(Net1MC)$group <- lvc1$membership
V(Net1MC)$color <- lvc1$membership 


Next, we plot the communities using the plot function from igraph:

plot(lvc1, Net1MC, vertex.label=V(Net1MC)$id,
     vertex.label.color = "black", 
     vertex.label.cex = 0.5, 
     vertex.size=1.8,
     main="Communities of places", 
     sub = "Louvain method")


Black ties link vertices within each group. Red ties link vertices across different communities.

3.2.2 Network of universities

Similarly, we create a group for each community of universities and we set a different color for each group:

V(Net2MC)$group <- lvc2$membership  # create a group for each community
V(Net2MC)$color <- lvc2$membership # node color reflects group membership 


Next, we plot the communities using the plot function from igraph:

plot(lvc2, Net2MC, vertex.label=V(Net2MC)$id,
     vertex.label.color = "black", 
     vertex.label.cex = 0.5, 
     vertex.size=3,
     main="Communities of universities", 
     sub = "Louvain method")


We need to acknowledge that these visualizations contain an overwhelming amount of information, which imposes significant limitations on their practical utility. To facilitate a meaningful interpretation of the results, it is advisable to extract and scrutinize each community separately.

3.3 Extract communities

The following code serves to retrieve the membership information contained in the results of community detection (lvc1$membership) as a coherent data frame. Additionally, we compute the size of communities and we join this data with the detailed description of the places.

place_clusters <- data.frame(lvc1$membership,
                          lvc1$names) %>% 
  group_by(lvc1.membership) %>% 
  add_tally() %>% # add size of clusters
  rename(PlaceLabel = lvc1.names, cluster_no = lvc1.membership, cluster_size = n) %>%
  select(cluster_no, cluster_size, PlaceLabel)

place_clusters <- inner_join(place_clusters, Result1_df, by = "PlaceLabel") 


kable(head(place_clusters), caption = "Communities of places (6 first places)") %>%
  kable_styling(bootstrap_options = "striped", full_width = T, position = "left")

Communities of places (6 first places)

| cluster_no | cluster_size | PlaceLabel | PlaceNumber | NbElements | NbSets | PlaceDetail |
|---|---|---|---|---|---|---|
| 1 | 35 | P001(1-4) | 1 | 1 | 4 | {Lacy_Carleton} - {Columbia;Garrett Biblical Institute;Northwestern;Ohio Wesleyan} |
| 2 | 25 | P002(1-4) | 2 | 1 | 4 | {Luccock_Emory W.} - {McCormick Seminary;Northwestern;Wabash;Wooster} |
| 3 | 26 | P003(1-4) | 3 | 1 | 4 | {Ly_J .Usang} - {Columbia;Haverford;New York University;Pennsylvania} |
| 1 | 35 | P004(1-4) | 4 | 1 | 4 | {Pott_Francis L. Hawks} - {Columbia;General Theological Seminary;Trinity;University of Edinburgh} |
| 4 | 20 | P005(1-3) | 5 | 1 | 3 | {Chu_Fred M.C.} - {Chicago;Pratt Institute;Y.M.C.A. College} |
| 3 | 26 | P006(1-3) | 6 | 1 | 3 | {Chung_Elbert} - {Georgetown;Pennsylvania;Southern California} |


We follow the same method for extracting community membership in the network of universities:

univ_clusters <- data.frame(lvc2$membership,
                        lvc2$names)  %>% 
  group_by(lvc2.membership) %>%  
  add_tally() %>% # add size of clusters
  rename(University = lvc2.names, cluster_no = lvc2.membership, 
         cluster_size = n) %>%
  select(cluster_no, cluster_size, University)


kable(head(univ_clusters), caption = "Communities of universities (6 first universities)") %>%
  kable_styling(bootstrap_options = "striped", full_width = T, position = "left")

Communities of universities (6 first universities)

| cluster_no | cluster_size | University |
|---|---|---|
| 1 | 18 | Antioch |
| 2 | 14 | Arizona |
| 2 | 14 | Beloit |
| 3 | 10 | Brown |
| 4 | 21 | Bucknell |
| 4 | 21 | Butler |


In the next steps, we extract the communities of places as individual graphs:

gp1 <- induced_subgraph(Net1MC, V(Net1MC)$group==1)  
gp2 <- induced_subgraph(Net1MC, V(Net1MC)$group==2) 
gp3 <- induced_subgraph(Net1MC, V(Net1MC)$group==3) 
gp4 <- induced_subgraph(Net1MC, V(Net1MC)$group==4) 
gp5 <- induced_subgraph(Net1MC, V(Net1MC)$group==5) 
gp6 <- induced_subgraph(Net1MC, V(Net1MC)$group==6)
gp7 <- induced_subgraph(Net1MC, V(Net1MC)$group==7)


Similarly, we extract the communities of universities:

gu1 <- induced_subgraph(Net2MC, V(Net2MC)$group==1)  
gu2 <- induced_subgraph(Net2MC, V(Net2MC)$group==2) 
gu3 <- induced_subgraph(Net2MC, V(Net2MC)$group==3) 
gu4 <- induced_subgraph(Net2MC, V(Net2MC)$group==4) 
gu5 <- induced_subgraph(Net2MC, V(Net2MC)$group==5) 
gu6 <- induced_subgraph(Net2MC, V(Net2MC)$group==6)
gu7 <- induced_subgraph(Net2MC, V(Net2MC)$group==7)
gu8 <- induced_subgraph(Net2MC, V(Net2MC)$group==8)
gu9 <- induced_subgraph(Net2MC, V(Net2MC)$group==9)
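
The repeated calls above can also be written as a loop that adapts automatically to the number of communities found. This is a minimal sketch on a hypothetical toy graph; the names g, groups and subgraphs are illustrative and are not objects from this tutorial.

```r
library(igraph)

# Hypothetical toy graph: two 5-node cliques joined by a single bridging tie
g <- disjoint_union(make_full_graph(5), make_full_graph(5))
g <- add_edges(g, c(1, 6))

set.seed(42)
lvc <- cluster_louvain(g)
V(g)$group <- as.integer(membership(lvc))

# One induced subgraph per community, stored in a named list
groups <- sort(unique(V(g)$group))
subgraphs <- lapply(groups, function(k) {
  induced_subgraph(g, which(V(g)$group == k))
})
names(subgraphs) <- paste0("g", groups)

sapply(subgraphs, vcount)   # number of vertices in each community
```

With the tutorial's objects, replacing g with Net1MC or Net2MC would yield a list whose elements correspond to gp1 to gp7 or gu1 to gu9, accessible as subgraphs[["g1"]] and so on.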


To illustrate the duality of place-based networks, we will plot the corresponding communities of places and universities to visually compare their structure.

3.4 Visual comparisons

As an example, we will compare the Columbia-centered community and the corresponding community of places:

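plot(gp1, vertex.label=V(gp1)$id, # labels taken from the subgraph itself, not from the full network
     vertex.label.color = "black", 
     vertex.label.cex = 0.5, 
     vertex.size= 5,
     main="Columbia community (places)")

plot(gu4, vertex.label=V(gu4)$id, # labels taken from the subgraph itself, not from the full network
     vertex.label.color = "black", 
     vertex.label.cex = degree(gu4)*0.15, # label size proportionate to node degree (in cluster)
     vertex.size= degree(gu4)*1.5, # node size proportionate to node degree (in cluster)
     main="Columbia community (universities)")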


The hairball structure of the community of places is transposed into the star-like structure of the community of universities.

The Princeton community presents a different, chain-type structure:

plot(gp2, vertex.label=V(gp2)$id, # labels taken from the subgraph itself, not from the full network
     vertex.label.color = "black", 
     vertex.label.cex = 0.5, 
     vertex.size= 5,
     main="Princeton community (places)")

plot(gu6, vertex.label=V(gu6)$id, # labels taken from the subgraph itself, not from the full network
     vertex.label.color = "black", 
     vertex.label.cex = degree(gu6)*0.15, # label size proportionate to node degree (in cluster)
     vertex.size= degree(gu6)*1.5, # node size proportionate to node degree (in cluster)
     main="Princeton community (universities)")

4 Regular equivalence (k-places)

The final section introduces the notion of regular equivalence7 as a more flexible approach to places or structural equivalence.

The Places package includes a kplaces() function, which is specifically designed to identify regular equivalence patterns within two-mode networks. It is very similar to the places() function and takes four main arguments:

  • data: the input data frame (auc)
  • col.elements: the name of the column of elements (e.g., students)
  • col.sets: the name of the column of sets (e.g., universities)
  • k: a natural number that indicates the tolerance threshold.

In the following example, we set k = 1, meaning that we tolerate at most one difference between the sets of universities attended by two students:

Result2 <- kplaces(data = auc, col.elements = "Name", col.sets = "University", k = 1)
Result2 <- kplaces(auc, "Name", "University", 1) # shorter version, same results


From the initial edge list of 418 students and 164 universities, 219 places and 2 k-places (or “ambiguous cases”) were found.

The kplaces() function returns a list with four data frames:

  1. The original two-column data frame, with an additional column “Places” containing the place labels.
  2. A data frame containing information about places and k-places.
  3. A data frame relating the places merged into k-places to the sets they have in common.
  4. The network of places in a two-mode edge list format.

The data frame (2) containing information about places and k-places includes the following columns:

  • PlaceLabel: the label of the place or k-place. Place labels start with P, followed by the place number and, in parentheses, the number of elements in the place and the number of sets defining it, e.g. P001(1-4). K-place labels start with P, followed by the k-place number, an asterisk (*) and, in parentheses, the number of elements in the k-place, the number of sets, the number of sets in common, and the value of k, e.g. P007*(2-3-2-1).
  • NbElements: the number of elements in the place or k-place.
  • NbSets: the number of sets defining the place or k-place.
  • PlaceDetail: the names of all the elements in the place or k-place and of all the sets defining it.
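
Because the labels encode several counts, it can be convenient to split them into separate columns. The helper below is a hypothetical sketch in base R (parse_place_label() is not part of the Places package). It assumes that the four numbers of a k-place label are, in order, the number of elements, the number of sets, the number of sets in common and the value of k, as suggested by the interpretation of P007*(2-3-2-1) given later in this section.

```r
# Two example labels taken from the tables in this tutorial
labels <- c("P001(1-4)", "P007*(2-3-2-1)")

# Hypothetical helper: split a place or k-place label into its components
parse_place_label <- function(label) {
  is_kplace <- grepl("*", label, fixed = TRUE)                 # k-places carry an asterisk
  number    <- as.integer(sub("^P([0-9]+).*$", "\\1", label))  # place number after "P"
  inside    <- sub("^.*\\((.*)\\)$", "\\1", label)             # text between parentheses
  counts    <- as.integer(strsplit(inside, "-", fixed = TRUE)[[1]])
  data.frame(
    label          = label,
    number         = number,
    is_kplace      = is_kplace,
    nb_elements    = counts[1],
    nb_sets        = counts[2],
    nb_common_sets = if (is_kplace) counts[3] else NA_integer_, # assumed order
    k              = if (is_kplace) counts[4] else NA_integer_  # assumed order
  )
}

parsed <- do.call(rbind, lapply(labels, parse_place_label))
parsed
```

The resulting data frame has one row per label, which makes it straightforward to filter k-places (is_kplace) or to sort places by the number of elements they contain.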

Let’s extract the information about places and k-places:

Result2_df <- as.data.frame(Result2$KPlacesData) 

kable(head(Result2_df), caption = "First 6 places/kplaces") %>%
  kable_styling(bootstrap_options = "striped", full_width = T, position = "left")

First 6 places/kplaces

| PlaceLabel | NbElements | NbSets | PlaceDetail |
|---|---|---|---|
| P001(1-4) | 1 | 4 | {Lacy_Carleton} - {Columbia;Garrett Biblical Institute;Northwestern;Ohio Wesleyan} |
| P002(1-4) | 1 | 4 | {Luccock_Emory W.} - {McCormick Seminary;Northwestern;Wabash;Wooster} |
| P003(1-4) | 1 | 4 | {Ly_J .Usang} - {Columbia;Haverford;New York University;Pennsylvania} |
| P004(1-4) | 1 | 4 | {Pott_Francis L. Hawks} - {Columbia;General Theological Seminary;Trinity;University of Edinburgh} |
| P005(1-3) | 1 | 3 | {Chu_Fred M.C.} - {Chicago;Pratt Institute;Y.M.C.A. College} |
| P006(1-3) | 1 | 3 | {Chung_Elbert} - {Georgetown;Pennsylvania;Southern California} |


Next, we focus on the k-places and identify the sets they have in common:

Result2k_df <- as.data.frame(Result2$kPlaces) 

kable(Result2k_df, caption = "Kplaces, corresponding places and common sets") %>%
  kable_styling(bootstrap_options = "striped", full_width = T, position = "left")

Kplaces, corresponding places and common sets

| row | k_places | Places | Common_Sets |
|---|---|---|---|
| 380 | P007*(2-3-2-1) | P007(1-3) | Columbia,Pomona |
| 384 | P007*(2-3-2-1) | P085(1-2) | Columbia,Pomona |
| 45 | P017*(2-3-2-1) | P017(1-3) | Chicago,Columbia |
| 87 | P017*(2-3-2-1) | P072(1-2) | Chicago,Columbia |


Each of the 2 k-places identified contains 2 elements (students) and 3 sets (universities), of which 2 sets are shared by both students and 1 is not:

  • P007*(2-3-2-1) includes Fong F. Sec and Edward Y.K. Kwong, who both attended Columbia University and Pomona College. They differ in that Fong F. Sec also attended the University of California, whereas Edward Y.K. Kwong did not.
  • P017*(2-3-2-1) includes H.C.E. Liu and Jui-Ching Hsia, who both attended the University of Chicago and Columbia University. They differ in that H.C.E. Liu also studied at Denison University, whereas Jui-Ching Hsia did not.
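
The tolerance logic behind the first case can be illustrated with plain set operations. This is a simplified sketch of the idea, not the algorithm implemented in kplaces(); the university names are written as in the prose above and may differ from the labels used in the auc data.

```r
# Universities attended by the two students of the first k-place (names as in the text)
fong  <- c("Columbia", "Pomona", "University of California")
kwong <- c("Columbia", "Pomona")

common      <- intersect(fong, kwong)                           # sets shared by both students
differences <- union(setdiff(fong, kwong), setdiff(kwong, fong)) # sets attended by only one of them

k <- 1                                                           # tolerance threshold
within_tolerance <- length(differences) <= k

common
differences
within_tolerance
```

With k = 0, the same comparison would require identical sets, which corresponds to the strict definition of a place.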

5 Conclusion

This tutorial has demonstrated the significant potential of a place-based approach to two-mode networks. It has laid the foundations for a standard workflow based on the “Places” R package, which can be reused and adapted for other research across diverse disciplinary fields. This method can be applied to virtually any type of nodes, not only human and social actors, but also objects, concepts, and other abstract entities. Furthermore, it can be extended to multimodal networks involving more than two different types of nodes. Ultimately, we hope this document will inspire innovative research based on this framework.

Bibliography

ARMAND, Cécile, 2024. Bonding minds, bridging nations: Sino-American alumni networks in the Era of Exclusion (1882-1936). In: HENRIOT, Christian and WU, Jen-Shu (eds.). Berlin: De Gruyter.
BLONDEL, Vincent D., GUILLAUME, Jean-Loup, LAMBIOTTE, Renaud and LEFEBVRE, Etienne, 2008. Fast unfolding of communities in large networks. In: Journal of Statistical Mechanics: Theory and Experiment [online]. October 2008. Vol. 2008, no. 10, pp. P10008. [Accessed 28 October 2023]. DOI 10.1088/1742-5468/2008/10/P10008. Available at: https://dx.doi.org/10.1088/1742-5468/2008/10/P10008.
BORGATTI, Stephen P., 2009. Two-Mode Concepts in Social Network Analysis. In: Encyclopedia of Complexity and Systems Science. 2009. Vol. 6, pp. 8279‑8291.
BORGATTI, Stephen P. and EVERETT, Martin G., 1997. Network analysis of 2-mode data. In: Social Networks. 1997. Vol. 19, no. 3, pp. 243‑269.
BORGATTI, Stephen P. and HALGIN, Daniel S., 2011. Analyzing Affiliation Networks. In: The Sage Handbook of Social Network Analysis [online]. S.l.: SAGE Publications Ltd. [Accessed 21 August 2023]. ISBN 978-1-4462-9441-3. Available at: https://doi.org/10.4135/9781446294413.n28.
EVERETT, M. G. and BORGATTI, S. P., 2013. The dual-projection approach for two-mode networks. In: Social Networks [online]. 2013. Vol. 35, no. 2, pp. 204‑210. [Accessed 21 August 2023]. DOI 10.1016/j.socnet.2012.05.004. Available at: https://www.sciencedirect.com/science/article/pii/S0378873312000354.
FIELD, Sam, FRANK, Kenneth A., SCHILLER, Kathryn, RIEGLE-CRUMB, Catherine and MULLER, Chandra, 2006. Identifying positions from affiliation networks: Preserving the duality of people and events. In: Social Networks [online]. 2006. Vol. 28, no. 2, pp. 97‑123. [Accessed 21 August 2023]. DOI 10.1016/j.socnet.2005.04.005. Available at: https://www.sciencedirect.com/science/article/pii/S0378873305000341.
NEWMAN, M. E. J., STROGATZ, S. H. and WATTS, D. J., 2001. Random graphs with arbitrary degree distributions and their applications. In: Physical Review E [online]. July 2001. Vol. 64, no. 2, pp. 026118. [Accessed 21 August 2023]. DOI 10.1103/PhysRevE.64.026118. Available at: http://arxiv.org/abs/cond-mat/0007235.
PIZARRO, Narciso, 2002. Appartenances, places et réseaux de places. La reproduction des processus sociaux et la génération d’un espace homogène pour la définition des structures sociales. In: Sociologie et sociétés. 2002. Vol. 31, no. 1, pp. 143‑161.
PIZARRO, Narciso, 2007. Structural Identity and Equivalence of Individuals in Social Networks. In: International Sociology. 2007. Vol. 22, no. 6, pp. 767‑792.
AMERICAN UNIVERSITY CLUB OF SHANGHAI, 1936. American University Men in China. Shanghai: Comacrib Press.
UZZI, B. and SPIRO, J., 2005. Collaboration and Creativity: The Small World Problem. In: American Journal of Sociology. 2005. Vol. 111, pp. 447.
ZHOU, Tao, REN, Jie, MEDO, Matús and ZHANG, Yi-Cheng, 2007. Bipartite network projection and personal recommendation. In: Physical Review E, Statistical, nonlinear, and soft matter physics. October 2007. Vol. 76, pp. 046115.

Annexes

Session info

setting value
version R version 4.2.2 (2022-10-31)
os Rocky Linux 8.8 (Green Obsidian)
system x86_64, linux-gnu
ui X11
language (EN)
collate fr_FR.UTF-8
ctype fr_FR.UTF-8
tz Europe/Paris
date 2023-10-28
pandoc 3.1.1 @ /usr/lib/rstudio-server/bin/quarto/bin/tools/ (via rmarkdown)
package ondiskversion source
dplyr 1.0.10 CRAN (R 4.2.1)
forcats 0.5.2 CRAN (R 4.2.2)
ggplot2 3.4.0 CRAN (R 4.2.2)
igraph 1.3.5 CRAN (R 4.2.2)
kableExtra 1.3.4 CRAN (R 4.2.2)
Places 0.2.3 local
purrr 0.3.5 CRAN (R 4.2.1)
readr 2.1.3 CRAN (R 4.2.1)
stringr 1.5.0 CRAN (R 4.2.1)
tibble 3.2.1 CRAN (R 4.2.2)
tidyr 1.2.1 CRAN (R 4.2.1)
tidyverse 1.3.2 CRAN (R 4.2.2)

Citation

Armand C (2021). “A place-based approach to two-mode networks.” doi:10.48645/xxxxxx, https://doi.org/10.48645/xxxxxx, https://rzine.fr/publication_rzine/xxxxxxx/.

BibTeX:

@Misc{,
  title = {A place-based approach to two-mode networks},
  subtitle = {Unlocking two-mode networks with the “Places” R package},
  author = {Cécile Armand},
  doi = {10.48645/xxxxxx},
  url = {https://rzine.fr/publication_rzine/xxxxxxx/},
  keywords = {FOS: Other social sciences},
  language = {fr},
  publisher = {FR2007 CIST},
  year = {2021},
  copyright = {Creative Commons Attribution Share Alike 4.0 International},
}


Glossary


  1. Two-mode network: A specific kind of network that involves two different types of nodes, such as persons and organizations. Such networks are also referred to as affiliation networks or bipartite graphs.

  2. Place: In a two-mode network, a place refers to a set of type-1 nodes that are associated with the exact same set of type-2 nodes. For example, in an affiliation network linking students with the universities they attended, two or more students form a place if they attended the exact same set of one or more universities.

  3. Structural equivalence: Two actors in a network are structurally equivalent if they have exactly the same ties to exactly the same other individual actors.

  4. Regular equivalence: Two actors are regularly equivalent if they are equally related to equivalent others. That is, regular equivalence sets are composed of actors who have similar relations to members of other regular equivalence sets. It corresponds quite closely to the sociological concept of a role.

  5. Edge list: An edge list is a data structure used in network analysis to represent a graph as a list of its edges.

  6. Modularity score: In network analysis, the modularity score measures the quality of a partition of a network into communities, on a scale ranging from −0.5 to 1. It compares the density of ties within each community with the density that would be expected in a suitably defined random network: the higher the score, the more densely connected the nodes within communities are relative to that random baseline.

  7. Regular equivalence: Two actors are regularly equivalent if they are equally related to equivalent others. That is, regular equivalence sets are composed of actors who have similar relations to members of other regular equivalence sets. It corresponds quite closely to the sociological concept of a role.